This video provides an overview of our method and, most importantly, 
showcases our synthesized results. For each task setting, we present 
both the videos generated by the 4D-aware video diffusion model 
(denoted as Diffusion4D*) and the videos rendered from the explicit 
4D construction (denoted as Diffusion4D).

Each generated video comprises 24 frames around a 4D asset, while each 
rendered video consists of 160 frames around the constructed 4D asset. 
We hope you enjoy watching our results! Thank you for your attention.
